Document or message security arrangements using a numerical hash function

ABSTRACT

A document or message is protected against forgery or repudiation by processing a selected part or parts of the text of the document or message to form a hash, usually of fewer characters than the selected part or parts of the text. The processing comprises retrieving numerical values which define the respective characters of the selected part or parts of the text and making a calculation using the numerical values of the successive characters. Preferably the hash is added to the text.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to arrangements for the protection of documents against forgery or repudiation. The invention also relates to arrangements for the protection of electronically transmitted messages against forgery or repudiation.

2. State of the Art

It is common nowadays to provide security to documents through the use of holograms, watermarks, personal signatures, notary stamps and other physical means: these all increase the difficulty of making unauthorised imitations or changes; however, they all require physical inspection, often involving forensic equipment and expertise, in order to detect a counterfeit. It is also becoming increasingly necessary to provide security for electronically transmitted messages.

SUMMARY OF THE INVENTION

The present invention provides for the security of the text of a document or message by cryptographic techniques.

In accordance with the present invention, there is provided an apparatus which is arranged to process a selected part or selected parts of the text of a document or message to form a hash, the hash usually being of fewer characters than the selected part or parts of the text, the processing comprising retrieving numerical values which define the respective characters of the selected part or parts of the text and making a calculation using the numerical values of the successive characters.

The apparatus may be arranged to receive or create a text in electronic form, then process this text to derive the hash of the selected part or parts of the text. The apparatus may further be arranged to add the hash to the text: typically, the apparatus then outputs the text, with the added hash, either for printing as a document or for electronic transmission. Alternatively the apparatus may be arranged to output the text and the hash separately (or store one and output the other).

The practical value of the hash is that it is sensitive to any change or alteration in the selected part of the text from which it is derived: it is not feasible to make a desired alteration to that part of the text whilst preserving the same hash value.

The hash thus forms a cryptographic signature which makes forgery detectable on the basis of an assessment of the content of the text and without the need for any forensic examination of the document.

The hash algorithm is not applied to the whole text, only to a selected part, or to selected parts. The or each part is identified, or sealed, by predetermined characters or combinations of characters immediately preceding and immediately following it: for example, a series of tilde marks (~) may be used.
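By way of illustration only, the identification of the sealed part or parts might be sketched as follows. The seal is assumed here to be a run of five tilde marks with no intervening spaces, and the name sealed_parts and the regular expression are hypothetical; the specification does not prescribe any particular implementation.

```python
import re

# Assumption: a seal is a run of five tilde marks; illustrative only.
SEAL = "~~~~~"

def sealed_parts(text: str) -> list[str]:
    """Return every part of the text lying between a pair of seals."""
    pattern = re.escape(SEAL) + r"(.*?)" + re.escape(SEAL)
    # DOTALL lets a sealed part run across line breaks ("return" characters).
    return re.findall(pattern, text, flags=re.DOTALL)

document = "Dear Sir,\n~~~~~Pay J. Smith 100 pounds~~~~~\nYours faithfully"
print(sealed_parts(document))  # ['Pay J. Smith 100 pounds']
```

Only the text returned by such a routine would be passed to the hash calculation; the remainder of the document could be changed without affecting the hash.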

Preferably the numerical values of the respective characters of the selected text are their ASCII values: the characters preferably include all keystrokes (including space, return etc.); preferably the “alphabet” is restricted to all keystrokes having ASCII values in the range 32 to 125 inclusive and also including ASCII values for the “return”.
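A minimal sketch of retrieving the numerical values under this preferred alphabet is given below. It assumes that "return" means line feed (10) or carriage return (13) and that characters outside the alphabet are rejected rather than skipped; neither point is settled by the text, and the name char_values is hypothetical.

```python
def char_values(part: str) -> list[int]:
    """Return the ASCII values of the characters of a sealed part.

    Assumptions: "return" is line feed (10) or carriage return (13), and
    any character outside the permitted alphabet raises an error.
    """
    values = []
    for ch in part:
        code = ord(ch)
        if 32 <= code <= 125 or code in (10, 13):
            values.append(code)
        else:
            raise ValueError(f"character {ch!r} is outside the permitted alphabet")
    return values

print(char_values("s"))  # [115]
```

It may be noted that the tilde mark has ASCII value 126 and so falls outside the range 32 to 125, which is consistent with its use as a seal rather than as part of the hashed text.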

Preferably the processing is recursive, in that the calculation in respect of each character uses the result of the calculation made in respect of at least one previous character.

Preferably the calculations for the first several (e.g. 10) characters use successive ones of a set of initial variables: preferably the calculation for each subsequent character uses, instead of an initial variable, the result of the calculation in respect of a previous character.

Preferably each calculation also uses one of a predetermined set of prime numbers. Preferably each calculation uses an interim result to determine which of these prime numbers is used to complete the calculation.

Preferably the processing involves at least a second pass over the selected part or parts of the text: in other words, once the calculation for the last character is completed, a second series of successive calculations is carried out on the characters, typically starting with the first character, and using the results of the calculations of the first series.

At the end of the above-described processing, the hash is formed by taking selected digits from the results obtained in a final plurality of the calculations: for example the final two digits may be taken from each of the final 10 results, and a 20-digit hash formed by placing these 10 pairs of digits in a given order.
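The formation of the 20-digit hash from the final 10 results might be sketched as follows. The ordering of the pairs and the zero-padding of results below 10 are assumptions, the name form_hash is hypothetical, and the input values are arbitrary numbers chosen only to exercise the routine.

```python
def form_hash(results: list[int]) -> str:
    """Concatenate the final two digits of each of the last 10 results."""
    if len(results) < 10:
        raise ValueError("at least 10 results are needed")
    # Assumptions: pairs are placed in the order the results were produced,
    # and a result below 10 is zero-padded to two digits.
    return "".join(f"{r % 100:02d}" for r in results[-10:])

# Arbitrary illustrative results, not taken from any figure.
sample = [644, 25149, 16002, 7, 100, 12345, 99, 5, 27299, 30539]
print(form_hash(sample))  # 44490207004599059939
```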

One form of hash algorithm used in the invention is an Objective Linguistic Hash (OLH). This is linguistic in that it “reads” letters, numbers and other keys commonly used in the preparation of documents. It is objective in that the hash value produced can be verified by anyone using the algorithm. The OLH algorithm produces a final number by acting recursively one character at a time throughout the length of the message.

The variability of the message far exceeds the variability of the final hash, so inevitably many different messages would have the same hash value. However, it is unfeasible to make a meaningful change to the message whilst retaining the same hash number.

It will be appreciated that the invention may be incorporated in a word processing apparatus. In this use, a document is created in electronic form on the apparatus, complete with the seal (e.g. series of tilde marks) at the beginning and end of the or each selected part of the text. A “sealing” command is then performed, whereupon the apparatus automatically processes the “sealed” part or parts of the text to create the hash, which is stored with the text. Subsequently, the document can be altered or corrected as necessary, then “re-sealed”, to process the sealed part or parts of the text again and create the hash afresh. Once the document is finalised, it can be printed out, complete with the hash.

The above-mentioned OLH algorithm may be modified to provide a Subjective Linguistic Hash (SLH). This differs from the OLH in that it is made subjective by being “seeded” with secret information known only to an accredited authority: thus, the processing of the selected or “sealed” part or parts of the text is carried out using secret initial variables. Preferably use is made of a seed, in the form of a very large secret number (typically having 50 to 200 digits) known as the Secret Primitive (SP). An algorithm is run, using the SP, to produce the initial variables: preferably this algorithm also uses a number of items of open information, known as Open Primitives (OP's), contained in the document or message being protected. The SLH algorithm may produce a plain hash initially, then encrypt this using the SP as secret key: this preserves the secrecy of the plain hash.

A further algorithm which can be used in accordance with the invention is a Subjective Encrypted Hash (SEH) algorithm. This involves encrypting an OLH hash, using secret primitive values known only to a witnessing party, together with open primitive values such as date and time. In this case, the witnessing party uses an apparatus into which the OLH of a document or message is keyed, together with the open primitive values, and which encrypts the OLH using the SEH algorithm, to create the SEH hash which is preferably printed on the document, or on a label for application to the document. Preferably the apparatus stores the initial OLH and the final SEH, together with the open primitive values.

Embodiments of the present invention will now be described by way of examples only and with reference to the accompanying drawings, in which:

FIG. 1 is a schematic diagram of an apparatus in accordance with the invention;

FIG. 2 sets out an example of a document text to be processed;

FIG. 3 gives an example of a set of initial variables to be used in the processing algorithm;

FIG. 4 is a table detailing the successive steps in applying the processing algorithm to the document text of FIG. 2;

FIG. 5 is a table detailing the successive steps in applying the processing algorithm in a second pass to the document text; and

FIG. 6 sets out the final 20-digit hash which is created.

Referring to FIG. 1, an apparatus in accordance with the present invention comprises a microprocessor 10 having connected to it a read-only memory (ROM) 12, a memory 14 for predetermined values, and a data store 16. The ROM 12 holds a hash algorithm and the memory 14 holds a set of initial variables and additionally a set of 64 prime numbers (each of 5 digits) and also three prime numbers (preferably the prime numbers 37, 17 and 7). The apparatus has an input port 18 coupled via a buffer 20 to the data store 16. The message or text to be processed may be received in electronic form on the input port 18, or it may be already stored in the data store 16. The microprocessor processes the message or text in accordance with the algorithm held in the ROM 12 and using the predetermined values held in the memory 14, in the manner which will be described below: the partial results of the hash calculation are written to and read from a results store 22. Finally, the calculated hash is added to the electronic text in the data store 16. The apparatus has a data output port 24, through which the message text complete with its hash can be sent from the data store 16, whether to a printer 26 or a transmission modem 28 or other computer peripheral device.

FIGS. 2 to 6 provide a worked example which uses an OLH algorithm on a selected part of the text (FIG. 2) of a document, typically a word processed document, namely the part between the two series of five tilde marks (˜ ˜ ˜ ˜ ˜). The hash algorithm uses a set of initial values or variables (IV's), in this example 2, 4, 8, 16, 32, 64, 128, 256, 512 and 1024 (FIG. 3): the algorithm additionally uses a set of 64 prime numbers (each of 5 digits) used as modulators and also three prime numbers, preferably 37, 17 and 7, as will be shown below; the OLH algorithm is stored in the ROM 12 of FIG. 1 and the initial variables and the prime numbers just referred to are stored in memory 14. The tables of FIGS. 4 and 5 show the processing carried out to create a 20-digit hash. The following description, referring firstly to FIG. 4, shows the manner in which the calculations proceed, taking for example the 16th row. It will be noted that the part of the message of FIG. 2 to be processed is set out character-by-character in the first column of the table of FIG. 4: the rows are numbered 0 to 9 cyclically (starting with 1) in the second column; the initial variables of FIG. 3 are used in turn in the 5th column, for the first 10 rows.

Thus, reading across the 16th row in the table of FIG. 4, we have:

s=the input character

6=the “y” value, i.e. the choice of recursive P(y)

115=the ASCII value of “s”

25149=the value of the result on the preceding row, namely P(5)=25149

16002=the last value of P(y), i.e. P(6) (found in the last column ten rows earlier).

41266=115+25149+16002 (the sum of the values in the three preceding columns in the same row)

16=the value of n, where n is the ordinal of the character in the text

5405846=41266×(115+16) (i.e., the sum obtained two columns to the left in the same row, multiplied by the sum of the ASCII value of “s” and the value of “n”).

37=the value of Z=(37×115+17×16+7×25149+30539) mod 64 (where the prime numbers 37, 17 and 7 multiply respectively the ASCII value of “s”, the value of “n” and the result of the preceding row, P(5), and “30539” is the value of “m” from the preceding row)

27299=“m”, the value of the 37th of the set of 64 5-digit prime numbers

644=5405846 (the value obtained in the same row three columns earlier) mod 27299 (i.e. mod “m”)
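The arithmetic of the 16th row can be checked with a short sketch such as the following. The function name olh_row and its parameter names are hypothetical, and because only one entry of the 64-prime table is stated in the text (Z=37 giving m=27299), the lookup used here is deliberately partial.

```python
def olh_row(ascii_value, n, prev_result, prev_p_y, prev_m, modulus_for):
    """Carry out the calculation for one row of the table.

    modulus_for maps the selector Z to a modulator prime m.
    """
    total = ascii_value + prev_result + prev_p_y      # sum of three columns
    product = total * (ascii_value + n)               # scaled by (ASCII + n)
    z = (37 * ascii_value + 17 * n + 7 * prev_result + prev_m) % 64
    m = modulus_for(z)                                # prime chosen by Z
    return total, product, z, m, product % m

# Only the entry for Z = 37 is given in the text; the rest is unknown.
known_primes = {37: 27299}
row = olh_row(115, 16, 25149, 16002, 30539, known_primes.__getitem__)
print(row)  # (41266, 5405846, 37, 27299, 644)
```

The interim value Z thus selects the modulator, so the prime used to complete each row depends on the character, its position and the results of earlier rows.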

It will be noted that the calculation on each row in the table of FIG. 4 is recursive, in that it uses results produced on previous rows (see the 4th and 5th items in each row). Further, in the example shown, the algorithm makes a second pass over the sealed part of the text: the successive calculations of this are set out in the table of FIG. 5. Finally, the 20-digit OLH hash is produced by selecting the final two digits of the results (final column) of the final 10 rows, placed in the order of recursive P(y)=0 to 9: this hash is set out in FIG. 6.
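A minimal sketch of the complete two-pass calculation is given below, under several assumptions that the text does not settle: the 64-prime table is not listed, so a hypothetical table of the first 64 primes above 10000 stands in for it; Z is taken to index that table from 0; the preceding result and the preceding "m" are taken as 0 for the very first row; and the ordinal n is assumed to keep counting through the second pass. The output will therefore not reproduce FIG. 6.

```python
def five_digit_primes(count=64, start=10001):
    """Hypothetical stand-in for the 64-prime table (not listed in the text)."""
    primes, candidate = [], start
    while len(primes) < count:
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            primes.append(candidate)
        candidate += 1
    return primes

PRIMES = five_digit_primes()
INITIAL_VARIABLES = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]  # FIG. 3

def olh(part: str, passes: int = 2) -> str:
    """Sketch of the recursive hash over a sealed part of the text."""
    if len(part) * passes < 10:
        raise ValueError("at least 10 rows are needed to form the hash")
    # Rows are numbered cyclically 1..9, 0; the initial variables are used in
    # turn, so row 1 takes the first variable and row 10 (y = 0) the last.
    p = {n % 10: iv for n, iv in enumerate(INITIAL_VARIABLES, start=1)}
    prev_result, prev_m, n = 0, 0, 0       # assumed values before the first row
    for _ in range(passes):
        for ch in part:
            n += 1                         # assumption: n continues across passes
            y, a = n % 10, ord(ch)
            total = a + prev_result + p[y]
            product = total * (a + n)
            z = (37 * a + 17 * n + 7 * prev_result + prev_m) % 64
            prev_m = PRIMES[z]             # assumption: 0-based index into table
            prev_result = product % prev_m
            p[y] = prev_result             # becomes the new last value of P(y)
    # Final two digits of the last 10 results, in the order P(y) = 0 to 9.
    return "".join(f"{p[y] % 100:02d}" for y in range(10))

print(olh("Pay J. Smith 100 pounds"))
```

Because each row feeds both the next row and the row ten places later, a change to any one character affects every subsequent result, which the second pass then carries back over the start of the text.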

Any attempt to alter the sealed part of the text, whilst retaining the same hash value, would require subsequent alterations in all further recursive steps to the end of that text. This is inherently difficult, but made more so by continuing the recursion back to the beginning of the sealed text for the second pass: a third pass may additionally be made.

The above-described OLH algorithm may be modified to form a Subjective Linguistic Hash (SLH). The SLH differs from the OLH in that it is made subjective by being “seeded” with secret information known only to an accredited authority: the initial values (IV's) are therefore secret. Preferably the seed is a very large secret number, typically with 50 to 200 digits, and known as the Secret Primitive (SP), known only to the issuing authority. The SLH algorithm “fuses” the SP with open information, known as Open Primitives (OP's), contained in the document or message, to produce the initial variables (IV's). Preferably the algorithm produces a “plain hash” in the first instance, which is then doubly encrypted using the SP as secret key. This preserves the secrecy of the plain hash and makes it mathematically unfeasible to work backwards through the document to discover the primitives.

A further algorithm which can be used is a Subjective Encrypted Hash (SEH). The SEH involves encrypting an OLH hash. The encryption incorporates secret primitive (SP) values known only to a witnessing party, and open primitive (OP) values such as date and time or other non-repeating factors. Further, the encryption is one-way, because the OLH is also fused with the OP and SP values. Since the key is therefore part of the message, the crypt cannot be reversed by application of the key. Every output value of a fixed OLH is therefore distinct, due to non-repeating elements in the OP's.
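The text does not disclose the SEH encryption itself. Purely to illustrate the idea of fusing an OLH with secret and open primitives in a one-way manner, the sketch below substitutes a different, standard technique, namely HMAC-SHA256 keyed with the SP and reduced to 20 decimal digits. It is not the SEH algorithm of the invention, and the function name, the separator and the sample values are all hypothetical.

```python
import hashlib
import hmac

def seh_stand_in(olh_hash: str, secret_primitive: str,
                 open_primitives: list[str]) -> str:
    """One-way fusion of an OLH with SP and OP values (HMAC stand-in)."""
    # The OLH and the open primitives (e.g. date and time) form the message.
    message = "|".join([olh_hash, *open_primitives]).encode("ascii")
    # The secret primitive, known only to the witnessing party, is the key.
    key = secret_primitive.encode("ascii")
    digest = hmac.new(key, message, hashlib.sha256).digest()
    # Reduce to 20 decimal digits, zero-padded, to match the OLH format.
    return f"{int.from_bytes(digest, 'big') % 10**20:020d}"

sp = "9" * 50                      # hypothetical 50-digit secret primitive
first = seh_stand_in("44490207004599059939", sp, ["1999-01-01", "12:00:00"])
later = seh_stand_in("44490207004599059939", sp, ["1999-01-01", "12:00:01"])
print(first != later)
```

In this stand-in, as in the description above, the same OLH witnessed at two different times yields two different outputs, and neither can be reproduced without knowledge of the SP.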

A number of possible applications of the invention will now be described by way of examples only.

A first application of the invention is for preventing fraudulent alteration of a Vehicle Registration Document. It is well known that stolen or redundant Vehicle Registration Documents have a value in the process of “ringing”, that is, altering the identity of a stolen car. To complete the fraud, a plausible Vehicle Registration Document is required. In a first case, the ringer will have to make a forged alteration to the document, for example, to cover a re-spray in a different colour. In a second case, if the ringer can alter the identity of a car, exactly to match the Vehicle Registration Document, then the fraud is undetectable to an unsuspecting buyer.

The present invention can prevent fraud in either case in the following way. When a vehicle is insured, the important fixed elements of the particular information concerning the vehicle and its keeper form the message parts of the hash algorithm. The secret primitives are in the possession of the insurer of the vehicle.

Examples of message parts are:

Owner, Keeper, Registration Number, Make, Model, Colour, Chassis Number, Engine number.

These parts are impossible to alter in a fraudulent way, without knowledge of the corresponding altered value of the SLH. Thus an SLH hash marked on the Registration Document protects against the first case of fraud. To solve the problem of the second case, OP's are added as follows:

Insurance Renewal Date, Mileage on last insurance, Stated value on last insurance.

These OP's have to be altered in the second case to give a vehicle a new false history. It is not possible for a ringer to do this because the true history is protected by earlier SLH's.

The SP for a given Insurance Company or other authority would preferably be a very large number, typically of 50 to 200 digits. It is preferable that the insurance company produces an updated SLH each year, using details of the vehicle and its keeper held or added to its stored record for that client, and including the vehicle mileage: the SLH may then be printed.

In a variation applicable to a vehicle registration document, the insurance company may produce an SLH each year, using details of the vehicle and its keeper, including the vehicle mileage: the SLH is then printed on a sticker, together with open information of the vehicle (e.g. mileage, value of the vehicle) for the keeper to stick on the vehicle registration document. Each time the insurance is renewed, an additional such sticker is created for the keeper to add to the registration document. It will be appreciated that the registration document will thus include, in respect of each renewal, a hash related to data printed in selected parts or fields of the document.

A second application of the invention is relevant to high value tickets, bought in advance where there is high risk of fraud. This form of fraud is rife for example in the sale of tickets for long-awaited pop concerts where the forged tickets are sold to young people in a social context where they are likely to be susceptible. Nothing can prevent a buyer from purchasing a ticket where there is no ready means of verifying its data, but with a suitable warning this application of the invention exerts psychological pressure due to the uncertainty that a ticket bought from an unofficial source will be valid on the day of the concert. A suitable warning might read as follows:

Warning: If you have bought this ticket from an unauthorised source, it may be a perfect forgery. Only genuine tickets will pass the electronic test at the turnstile. Do not run the risk of being turned away.

Each event is given an SP which is available as an input to the software used at legitimate outlets. This SP is only released to points of entry to the concert immediately before the crowds start to appear. The point of entry has a machine for reading the hash from the ticket: the hash may be printed on the ticket, at the time of issue, in both human-readable and machine-readable form. The OP is a combination of the date and time of sale, correct to the nearest second, and the name of the buyer. The SLH is also printed on the ticket. Even if the fraudster prints a very recent time and date, it is mathematically unfeasible to calculate the appropriate SLH, so he has a hazardous task of persuading the buyer that he/she must attach no significance to the lapse of time. Further, the buyer who reads the warning on the reverse side of the ticket is put under the psychological pressure of having to wait for the concert itself before knowing whether the ticket is valid.

A third application of the invention is relevant to National Identity Cards which display a photograph and personal details of the legitimate owner. The invention provides for a massive SP (containing at least 400 figures) held in a tamper proof location. The printed matter of the card is classified either as message parts to be hashed or OP's. The SLH is printed on the face of the card as additional information. This prevents alteration of a card or the printing of a false identity.

A fourth application of the invention is the use of a Trusted Third Party such as an accredited Notary Public to provide an SEH supplied with a pre-calculated OLH for a “sealed” part of a document. The document itself may either be sent in plain or in crypt. The function of the notary is to use the OLH to calculate the SEH. The document may be processed to provide it with a double header, the OLH and the SEH which incorporates a date/time stamp. In the event of a dispute both “versions” of the disputed text can be tested by an OLH, but only the valid OLH will have the proper SEH. 

What is claimed is:
 1. An apparatus which is arranged to process a selected part or selected parts of the text of a document or message to form a hash, the hash being of fewer characters than the selected part or parts of the text, the processing comprising retrieving numerical values which define the respective characters of the selected part or parts of the text and making a calculation using the numerical values of the successive characters of said selected part or parts of the text, the calculation made in respect of each said character using the result of the calculation made in respect of at least one previous character and also using one of a predetermined set of prime numbers, and each said calculation using an interim result to determine which of said set of prime numbers is used to continue that calculation.
 2. An apparatus as claimed in claim 1, which is arranged to receive or create said text in electronic form, then process said text to derive said hash.
 3. An apparatus as claimed in claim 2, arranged to add said hash to said text.
 4. An apparatus as claimed in claim 3, arranged to output text, with the added hash.
 5. An apparatus as claimed in claim 2, arranged to output said text and its hash separately, or to store one and output the other.
 6. An apparatus as claimed in claim 1, arranged for the or each said part of said text to be identified by predetermined characters or combinations of characters immediately preceding and following it.
 7. An apparatus as claimed in claim 6, in which each said identifier comprises a series of tilde marks.
 8. An apparatus as claimed in claim 1, in which said numerical values of the respective characters of the selected text are their ASCII values.
 9. An apparatus as claimed in claim 8, in which an alphabet which includes all said characters is restricted to all keystrokes having ASCII values in the range 32 to 125 inclusive.
 10. An apparatus as claimed in claim 1, arranged so that the calculations made for a first plurality of characters use successive ones of a set of initial variables.
 11. An apparatus as claimed in claim 1, arranged so that said processing involves at least a second pass over the selected part or parts of said text.
 12. An apparatus as claimed in claim 1, arranged so that at the end of said processing, the hash is formed by taking selected digits from the results obtained in a final plurality of said calculations.
 13. An apparatus as claimed in claim 1, arranged such that said hash is seeded with secret information.
 14. An apparatus as claimed in claim 13, arranged such that said processing is carried out using secret initial variables.
 15. An apparatus as claimed in claim 1, arranged to encrypt said hash.
 16. An apparatus as claimed in claim 15, arranged to store said hash and the encrypted hash formed from it.
 17. A process of forming a hash from a selected part or from selected parts of the text of a document or message, the process comprising retrieving numerical values which define the respective characters of the selected part or parts of said text and making a calculation using the numerical values of the successive characters of said selected part or parts of the text, the calculation in respect of each said character using the result of the calculation made in respect of at least one previous character and also using one of a predetermined set of prime numbers, wherein each said calculation uses an interim result to determine which of said set of prime numbers is used to continue the calculation, said hash being of fewer characters than said selected part or parts of the text.
 18. A process as claimed in claim 17, in which the calculations made for a first plurality of said characters also use successive ones of a set of initial variables. 